A Survey of POMDP Solution Techniques

Author

  • Kevin P. Murphy
Abstract

One of the goals of AI is to design an agent which can interact with an environment so as to maximize some reward function. Control theory addresses the same problem, but uses slightly different language: agent = controller, environment = plant, maximizing reward = minimizing cost. Control theory is mainly concerned with tasks in continuous spaces, such as designing a guided missile to intercept an airplane in minimum expected time, whereas AI is mainly concerned with tasks in discrete spaces, such as designing a program to play bridge so as to maximize the chance of winning. Nevertheless, AI and control theory have much in common [DW91], and some problems, such as designing a mobile robot to perform household chores, will require techniques from both fields.

When designing agents that can act under uncertainty, it is convenient to model the environment as a POMDP (Partially Observable Markov Decision Process, pronounced "pom-dp"). At (discrete) time step t, the environment is assumed to be in some state X_t. The agent then performs an action (control) A_t, whereupon the environment (stochastically) changes to a new state X_{t+1}. The agent does not see the environment state, but instead receives an observation Y_t, which is some (stochastic) function of X_t. (If Y_t = X_t, the POMDP reduces to a fully observed MDP.) In addition, the agent receives a special observation signal called the reward, R_t. The POMDP is characterized by the state transition function P(X_{t+1} | X_t, A_t), the observation function P(Y_t | X_t, A_{t-1}), and the reward function E(R_t | X_t, A_{t-1}). The goal of the agent is to learn a policy π which maps the observation history (trajectory) to an action, so as to maximize the expected sum of rewards.
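To make the formalism concrete, here is a minimal sketch (not from the paper) of a tabular POMDP in Python, together with the Bayes-filter belief update an agent can use to summarize its observation history; the names TabularPOMDP and belief_update, and the array layout, are illustrative assumptions rather than any standard API.

    # Minimal sketch, assuming a small discrete (tabular) POMDP.
    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class TabularPOMDP:
        T: np.ndarray  # T[a, x, x2] = P(X_{t+1} = x2 | X_t = x, A_t = a)
        O: np.ndarray  # O[a, x, y]  = P(Y_t = y | X_t = x, A_{t-1} = a)
        R: np.ndarray  # R[a, x]     = E(R_t | X_t = x, A_{t-1} = a)

    def belief_update(pomdp, belief, a, y):
        """New belief over X_{t+1} after performing action a and observing y."""
        predicted = belief @ pomdp.T[a]              # predict: P(X_{t+1} | history, a)
        unnormalised = predicted * pomdp.O[a][:, y]  # correct: weight by P(y | X_{t+1}, a)
        return unnormalised / unnormalised.sum()     # renormalise to a distribution

The belief state b_t(x) = P(X_t = x | Y_{1:t}, A_{1:t-1}) computed this way is a sufficient statistic for the whole trajectory, which is why many solution methods plan over beliefs rather than raw histories.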


Related articles

POMDP solution methods

This is an overview of partially observable Markov decision processes (POMDPs). We describe POMDP value and policy iteration as well as gradient ascent algorithms. The emphasis is on solution methods that work directly in the space of policies.
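For a sense of what value iteration over belief states involves, here is a rough sketch (again using the hypothetical TabularPOMDP layout from the earlier example, and not the specific algorithms of this overview) of grid-based approximate value iteration, which discretises the belief simplex and snaps successor beliefs to their nearest grid point:

    # Grid-based (nearest-neighbour) approximate value iteration over beliefs.
    # The grid resolution, discount factor and iteration count are arbitrary choices.
    import itertools
    import numpy as np

    def belief_grid(n_states, resolution):
        """All beliefs whose entries are multiples of 1/resolution."""
        points = [c for c in itertools.product(range(resolution + 1), repeat=n_states)
                  if sum(c) == resolution]
        return np.array(points, dtype=float) / resolution

    def grid_value_iteration(pomdp, grid, gamma=0.95, n_iters=50):
        n_actions, n_states, _ = pomdp.T.shape
        n_obs = pomdp.O.shape[2]
        V = np.zeros(len(grid))
        for _ in range(n_iters):
            V_new = np.empty_like(V)
            for i, b in enumerate(grid):
                q = np.zeros(n_actions)
                for a in range(n_actions):
                    predicted = b @ pomdp.T[a]               # P(X_{t+1} | b, a)
                    q[a] = predicted @ pomdp.R[a]            # E(R_{t+1} | b, a), reward convention as above
                    for y in range(n_obs):
                        p_y = predicted @ pomdp.O[a][:, y]   # P(y | b, a)
                        if p_y > 0:
                            b_next = predicted * pomdp.O[a][:, y] / p_y
                            j = np.abs(grid - b_next).sum(axis=1).argmin()  # nearest grid point
                            q[a] += gamma * p_y * V[j]
                V_new[i] = q.max()
            V = V_new
        return V

Exact value iteration instead represents the value function as a piecewise-linear convex set of alpha-vectors, and the policy-space methods this overview emphasises (e.g. gradient ascent on controller parameters) avoid enumerating beliefs altogether.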

Policy optimization by marginal-map probabilistic inference in generative models

While most current work in POMDP planning focuses on the development of scalable approximate algorithms, existing techniques often neglect performance guarantees and sacrifice solution quality to improve efficiency. In contrast, our approach to optimizing POMDP controllers by probabilistic inference and obtaining bounds on solution quality can be summarized as follows: (1) re-formulate POMDP pla...

Finding Optimal POMDP Controllers Using Quadratically Constrained Linear Programs

Developing scalable algorithms for solving partially observable Markov decision processes (POMDPs) is an important challenge. One promising approach is based on representing POMDP policies as finite-state controllers. This method has been used successfully to address the intractable memory requirements of POMDP algorithms. We illustrate some fundamental theoretical limitations of existing techn...
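A finite-state controller in this sense is a small policy graph: each node prescribes an action, and the observation received determines the next node. Below is a minimal sketch of such a controller and how it is executed; the names are illustrative, the maps are kept deterministic for brevity, and this is not the QCLP formulation itself.

    # A finite-state controller: node -> action, and (node, observation) -> next node.
    from dataclasses import dataclass

    @dataclass
    class FiniteStateController:
        action_of_node: list[int]    # psi[n]    = action to take in node n
        next_node: list[list[int]]   # eta[n][y] = successor node after observing y in node n

        def act(self, node):
            return self.action_of_node[node]

        def step(self, node, observation):
            return self.next_node[node][observation]

The quadratically constrained formulation mentioned in the title treats the (stochastic) controller parameters as decision variables coupled to the value function through Bellman-style constraints, which keeps the controller size, and hence memory use, fixed.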

A specialised POMDP form and algorithm for clinical patient management

Partially observable Markov decision processes (POMDPs) have recently been suggested as a suitable model for formalising the planning of clinical patient management over a prolonged period of time. However, practical application of POMDP models is hampered by the computational complexity of associated solution methods. It is argued that the full generality of POMDPs is not needed to support many...

Publication year: 2000